AITopics | expert-supervised reinforcement learning

Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation

Neural Information Processing Systems, Dec-24-2025, 18:11:22 GMT

Offline Reinforcement Learning (RL) is a promising approach for learning optimal policies in environments where direct exploration is expensive or infeasible. However, the adoption of such policies in practice is often challenging, as they are hard to interpret within the application context, and lack measures of uncertainty for the learned policy value and its decisions. To overcome these issues, we propose an Expert-Supervised RL (ESRL) framework which uses uncertainty quantification for offline policy learning. In particular, we have three contributions: 1) the method can learn safe and optimal policies through hypothesis testing, 2) ESRL allows for different levels of risk-averse implementations tailored to the application context, and finally, 3) we propose a way to interpret ESRL's policy at every state through posterior distributions, and use this framework to compute off-policy value function posteriors. We provide theoretical guarantees for our estimators and regret bounds consistent with Posterior Sampling for RL (PSRL). Sample efficiency of ESRL is independent of the chosen risk aversion threshold and quality of the behavior policy.
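The abstract does not spell out the hypothesis test, so the following is only a minimal sketch of the general idea, under stated assumptions: K sampled Q-estimates are available for one state, the candidate action is chosen by majority vote over their greedy actions, and the policy deviates from the behavior action only when the fraction of samples favouring the candidate reaches `1 - alpha`. The function name `choose_action`, the parameter `alpha`, and the decision rule itself are hypothetical illustrations, not the paper's algorithm.

```python
from collections import Counter


def choose_action(q_samples, behavior_action, alpha=0.05):
    """Pick an action for one state from K sampled Q-estimates.

    q_samples[k][a] is the k-th sampled Q-value of action a (assumed input).
    alpha is a hypothetical risk-aversion threshold: smaller alpha means
    more evidence is required before deviating from the behavior action.
    """
    # Greedy action under each sampled Q-estimate.
    greedy = [max(range(len(q)), key=q.__getitem__) for q in q_samples]
    # Majority vote selects the candidate action.
    candidate, _ = Counter(greedy).most_common(1)[0]
    if candidate == behavior_action:
        return behavior_action
    # Fraction of samples in which the candidate beats the behavior action.
    support = sum(q[candidate] > q[behavior_action] for q in q_samples) / len(q_samples)
    # Deviate only when the sampled evidence is strong enough.
    return candidate if support >= 1.0 - alpha else behavior_action
```

With this rule, a small `alpha` keeps the policy close to the behavior action unless nearly all samples agree, while a larger `alpha` lets it deviate on weaker evidence.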

expert-supervised reinforcement learning, name change, offline policy learning and evaluation, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation

Neural Information Processing Systems, Aug-16-2025, 19:21:37 GMT

With increasing success in reinforcement learning (RL), there is broad interest in applying these methods to real-world settings. This has brought exciting progress in offline RL and off-policy policy evaluation (OPPE).

behavior policy, machine learning, reinforcement learning, (13 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > Canada (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.67)

Industry:

Health & Medicine > Therapeutic Area (0.47)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Review for NeurIPS paper: Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation

Neural Information Processing Systems, Feb-7-2025, 02:15:03 GMT

Weaknesses: The empirical evaluation misses relevant baselines, making it quite hard to evaluate the usefulness of ESRL in comparison to prior approaches. The main algorithm (Algo 1) incorporates the use of majority voting and hypothesis testing in addition to learning multiple Q-estimates based on K sampled MDPs. Furthermore, based on the figure captions, K seems to be large (250 for Riverswim, 500 for Sepsis) and it seems unfair to use a single DQN model. A *naive* baseline would be to use the ensemble of these K Q-estimates and simply use their mean for selecting actions: this would *quantify* the empirical benefit from hypothesis testing. This baseline should be discussed in the paper and compared against empirically, as it is a simple way to incorporate value uncertainty in offline RL. 3. As mentioned in the paper, ESRL can deviate from the behavior policy when required or stick to it, depending on the outcome of the hypothesis test.
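The naive baseline the reviewer describes can be sketched in a few lines. This is an illustration only, assuming the K Q-estimates for a single state are given as plain lists; the function name `ensemble_mean_action` is hypothetical and nothing here comes from the paper's code.

```python
def ensemble_mean_action(q_estimates):
    """Average K Q-estimates for one state and act greedily on the mean.

    q_estimates[k][a] is the k-th estimate of Q(s, a) (assumed input).
    Returns the chosen action index and the per-action mean Q-values.
    """
    n_actions = len(q_estimates[0])
    k = len(q_estimates)
    # Per-action mean across the ensemble.
    mean_q = [sum(q[a] for q in q_estimates) / k for a in range(n_actions)]
    # Greedy action with respect to the ensemble mean.
    action = max(range(n_actions), key=mean_q.__getitem__)
    return action, mean_q
```

Unlike a test-based rule, this baseline ignores the behavior policy entirely, so comparing the two would isolate what the hypothesis test contributes.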

arxiv preprint arxiv, expert-supervised reinforcement learning, offline policy learning and evaluation, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.77)

Review for NeurIPS paper: Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation

Neural Information Processing Systems, Feb-7-2025, 01:47:32 GMT

This paper proposes an interesting way to use hypothesis testing to incorporate expert knowledge into offline RL. The proposed approach is exciting and good enough to be published at NeurIPS. The experimental results are interesting as well. However, the authors should address the concerns about the presentation and theoretical results raised by Reviewer 1 in the camera-ready version of the paper. At the very least, they should discuss these concerns as limitations of the approach in the paper's conclusion.

expert-supervised reinforcement learning, neurips paper, offline policy learning and evaluation

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation

Neural Information Processing Systems, Oct-11-2024, 13:06:05 GMT

Offline Reinforcement Learning (RL) is a promising approach for learning optimal policies in environments where direct exploration is expensive or infeasible. However, the adoption of such policies in practice is often challenging, as they are hard to interpret within the application context, and lack measures of uncertainty for the learned policy value and its decisions. To overcome these issues, we propose an Expert-Supervised RL (ESRL) framework which uses uncertainty quantification for offline policy learning. In particular, we have three contributions: 1) the method can learn safe and optimal policies through hypothesis testing, 2) ESRL allows for different levels of risk-averse implementations tailored to the application context, and finally, 3) we propose a way to interpret ESRL's policy at every state through posterior distributions, and use this framework to compute off-policy value function posteriors. We provide theoretical guarantees for our estimators and regret bounds consistent with Posterior Sampling for RL (PSRL). Sample efficiency of ESRL is independent of the chosen risk aversion threshold and quality of the behavior policy.

application context, expert-supervised reinforcement learning, offline policy learning and evaluation, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation

Aaron Sonabend-W, Junwei Lu, Leo A. Celi, Tianxi Cai, Peter Szolovits

arXiv.org Artificial Intelligence, Jun-23-2020

Offline Reinforcement Learning (RL) is a promising approach for learning optimal policies in environments where direct exploration is expensive or unfeasible. However, the adoption of such policies in practice is often challenging, as they are hard to interpret within the application context, and lack measures of uncertainty for the learned policy value and its decisions. To overcome these issues, we propose an Expert-Supervised RL (ESRL) framework which uses uncertainty quantification for offline policy learning. In particular, we have three contributions: 1) the method can learn safe and optimal policies through hypothesis testing, 2) ESRL allows for different levels of risk aversion within the application context, and finally, 3) we propose a way to interpret ESRL's policy at every state through posterior distributions, and use this framework to compute off-policy value function posteriors. We provide theoretical guarantees for our estimators and regret bounds consistent with Posterior Sampling for RL (PSRL) that account for any risk aversion threshold. We further propose an offline version of PSRL as a special case of ESRL.
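To illustrate what an off-policy value function posterior might look like in the simplest tabular case, here is a rough sketch in the spirit of posterior sampling. It assumes a Dirichlet posterior over transition probabilities built from offline counts, known rewards, and a discounted infinite-horizon value; all function names, the `prior` pseudo-count, and these modelling choices are assumptions for illustration and are not taken from the paper.

```python
import random


def sample_transition_row(counts, prior=1.0, rng=random):
    """Draw one next-state distribution from Dirichlet(counts + prior)."""
    # A Dirichlet draw is a vector of Gamma variates normalised to sum to 1.
    draws = [rng.gammavariate(c + prior, 1.0) for c in counts]
    total = sum(draws)
    return [d / total for d in draws]


def evaluate_policy(P, R, policy, gamma=0.9, sweeps=200):
    """Iterative policy evaluation on one sampled tabular MDP."""
    V = [0.0] * len(P)
    for _ in range(sweeps):
        # V(s) = R[s][a] + gamma * sum_s' P[s][a][s'] * V(s'), with a = policy[s]
        V = [
            R[s][policy[s]]
            + gamma * sum(p * v for p, v in zip(P[s][policy[s]], V))
            for s in range(len(P))
        ]
    return V


def value_posterior(counts, R, policy, K=100, gamma=0.9, seed=0):
    """Return K sampled value functions for `policy`.

    counts[s][a][s2] are offline transition counts (assumed input);
    R[s][a] are rewards, assumed known here for simplicity.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(K):
        # Sample one MDP from the posterior, then evaluate the policy on it.
        P = [[sample_transition_row(row, rng=rng) for row in state] for state in counts]
        samples.append(evaluate_policy(P, R, policy, gamma))
    return samples
```

The spread of the K sampled values at each state gives a simple picture of how uncertain the offline data leaves the value of the evaluated policy.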

machine learning, posterior distribution, reinforcement learning, (14 more...)

2006.13189

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
